Statistical Modelling of COVID-19 Outbreak in Italy

20 Apr 2020



Nonlinear growth models

Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.

By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.

Exponential

\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.

Logistic

\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).

Gompertz

\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.

The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.

Richards

\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.

Data

Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv

Source: https://github.com/pcm-dpc/COVID-19

url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)

Warnings

- 15/04/2020: dati Regione Friuli Venezia Giulia ricalcolati (ricalcolo di isolamento domiciliare e dimessi/guariti)
- 12/04/2020: dati P.A. Bolzano ricalcolati (ricalcolo dati guariti -110 rispetto a ieri).
- 10/04/2020: dati Regione Molise parziali (dato tamponi non aggiornato).
- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi


Modelling total infected

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$totale_casi,
                                    dy = reldiff(COVID19$totale_casi))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##         Estimate   Std. Error t value        Pr(>|t|)    
## th1 14994.832477  1800.696639   8.327 0.0000000000256 ***
## th2     0.046739     0.002496  18.723         < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17800 on 55 degrees of freedom
## 
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 0.000006995

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##         Estimate  Std. Error t value Pr(>|t|)    
## Asym 182505.2248   2401.3185   76.00   <2e-16 ***
## xmid     34.5838      0.3128  110.55   <2e-16 ***
## scal      7.6360      0.2078   36.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3810 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.0000004779

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
#            control = nls.control(maxiter = 1000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##            Estimate     Std. Error t value Pr(>|t|)    
## Asym 213901.7727687   1858.7458022  115.08   <2e-16 ***
## b2        8.9596130      0.1954790   45.83   <2e-16 ***
## b3        0.9336918      0.0008955 1042.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1372 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.0000007053

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "plinear", 
           control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging... 
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##          Estimate    Std. Error t value Pr(>|t|)    
## th1 224716.563052   2559.610301   87.79   <2e-16 ***
## th2      0.059056      0.001181   50.02   <2e-16 ***
## th3      6.356968      0.189742   33.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1347 on 54 degrees of freedom
## 
## Number of iterations to convergence: 3 
## Achieved convergence tolerance: 0.00739
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3, 
#             data = data, start = start, trace = TRUE)

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -637.7262 3 0.9321684 1281.4525 1281.9053 1287.5816
Logistic model -549.3269 4 0.9970040 1106.6539 1107.4231 1114.8261
Gompertz model -491.0959 4 0.9995541 990.1917 990.9610 998.3640
Richards model -490.0401 4 0.9996004 988.0802 988.8494 996.2524 ***
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 10000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 5000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(100,NA)) +
  labs(y = "Infected (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
                     minor_breaks = seq(0, max(ylim), by = 5000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date    fit    lwr    upr
## 58  2020-04-21 225551 177138 274244
## 581 2020-04-21 174382 165138 182393
## 582 2020-04-21 180908 177586 184444
## 583 2020-04-21 182096 178671 185826

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Infected", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling total deceased

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$deceduti,
                                    dy = reldiff(COVID19$deceduti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##        Estimate  Std. Error t value      Pr(>|t|)    
## th1 1385.521489  185.775603   7.458 0.00000000067 ***
## th2    0.053077    0.002734  19.412       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2210 on 55 degrees of freedom
## 
## Number of iterations to convergence: 13 
## Achieved convergence tolerance: 0.000003294

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 24464.7472   320.8056   76.26   <2e-16 ***
## xmid    37.2904     0.2823  132.09   <2e-16 ***
## scal     7.0715     0.1815   38.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 452.2 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000005596

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start, 
#            control = nls.control(maxiter = 10000))
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 29277.647523   246.754085   118.7   <2e-16 ***
## b2      12.600917     0.293038    43.0   <2e-16 ***
## b3       0.930523     0.000849  1096.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 152 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000002976

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##         Estimate   Std. Error t value Pr(>|t|)    
## th1 30358.238565   315.136269   96.33   <2e-16 ***
## th2     0.064976     0.001091   59.58   <2e-16 ***
## th3     9.641781     0.289109   33.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 151.6 on 54 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 0.0000003978

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -518.8139 3 0.9421361 1043.6278 1044.0806 1049.7570
Logistic model -427.8493 4 0.9976506 863.6986 864.4678 871.8708
Gompertz model -365.6899 4 0.9996906 739.3798 740.1491 747.5520
Richards model -365.5423 4 0.9997028 739.0845 739.8537 747.2567 ***
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Deceased (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 58  2020-04-21 30101 23736 36473
## 581 2020-04-21 23223 22049 24116
## 582 2020-04-21 24128 23743 24473
## 583 2020-04-21 24237 23838 24650

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Deceased", color = "Model") +
  scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Modelling recovered

# create data for analysis
data = data.frame(date = COVID19$data,
                  y = COVID19$dimessi_guariti,
                                    dy = reldiff(COVID19$dimessi_guariti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))

Estimation

Exponential

mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
## 
## Formula: y ~ exponential(x, th1, th2)
## 
## Parameters:
##        Estimate  Std. Error t value Pr(>|t|)    
## th1 1242.145522  116.084479   10.70 4.67e-15 ***
## th2    0.065967    0.001851   35.64  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2094 on 55 degrees of freedom
## 
## Number of iterations to convergence: 11 
## Achieved convergence tolerance: 0.000009463

Logistic

mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
## 
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Asym 65614.5750  2245.2100   29.22   <2e-16 ***
## xmid    48.2167     0.6943   69.44   <2e-16 ***
## scal     9.1178     0.2400   37.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 749.7 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.000001772

Gompertz

mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
## 
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
## 
## Parameters:
##          Estimate   Std. Error t value Pr(>|t|)    
## Asym 141003.27102   7534.79085   18.71   <2e-16 ***
## b2        8.51976      0.15547   54.80   <2e-16 ***
## b3        0.96425      0.00108  892.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 390.3 on 54 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 0.0000003524

Richards

richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss  <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2) 
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss, 
               y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
           # trace = TRUE, # algorithm = "port", 
           control = nls.control(maxiter = 1000))
summary(mod4)
## 
## Formula: y ~ richards(x, th1, th2, th3)
## 
## Parameters:
##          Estimate    Std. Error t value      Pr(>|t|)    
## th1 294726.102196  39976.868993   7.372 0.00000000102 ***
## th2      0.018380      0.001573  11.688       < 2e-16 ***
## th3      4.168223      0.161473  25.814       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 334.5 on 54 degrees of freedom
## 
## Number of iterations to convergence: 19 
## Achieved convergence tolerance: 0.000002674

Models comparison

models = list("Exponential model" = mod1, 
              "Logistic model" = mod2, 
              "Gompertz model" = mod3,
              "Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
                 df = sapply(models, function(m) attr(logLik(m), "df")),
                 Rsquare = sapply(models, function(m) 
                                  cor(data$y, fitted(m))^2),
                 AIC = sapply(models, AIC),
                 AICc = sapply(models, AICc),
                 BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
                 cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)
loglik df Rsquare AIC AICc BIC
Exponential model -515.7289 3 0.9848721 1037.4578 1037.9107 1043.5870
Logistic model -456.6637 4 0.9979485 921.3273 922.0966 929.4995
Gompertz model -419.4481 4 0.9993663 846.8962 847.6655 855.0684
Richards model -410.6614 4 0.9995213 829.3229 830.0921 837.4951 ***
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(aes(y = fitted(mod1), color = "Exponential")) +
  geom_line(aes(y = fitted(mod2), color = "Logistic")) +
  geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
  geom_line(aes(y = fitted(mod4), color = "Richards")) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_color_manual(values = cols) +
  scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
                     minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

last_plot() +
  scale_y_continuous(trans = "log10", limits = c(10,NA)) +
  labs(y = "Recovered (log10 scale)")

Predictions

Point estimates

df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
               fit1 = predict(mod1, newdata = df),
               fit2 = predict(mod2, newdata = df),
               fit3 = predict(mod3, newdata = df),
               fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() + 
  geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
  geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
  geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
  geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
  coord_cartesian(ylim = ylim) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Prediction intervals

# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))

pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]

pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]

pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]

pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]

# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
             subset(pred2, x == max(data$x)+1, select = 2:5),
             subset(pred3, x == max(data$x)+1, select = 2:5),
             subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
##           date   fit   lwr   upr
## 58  2020-04-21 56994 50881 63470
## 581 2020-04-21 48894 46554 50695
## 582 2020-04-21 50278 49217 51229
## 583 2020-04-21 50726 49787 51539

ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))
ggplot(data, aes(x = date, y = y)) + 
  geom_point() +
  geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
  geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
  geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
  geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
  geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
  geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr), 
              inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
  geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
  geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
  coord_cartesian(ylim = c(0, max(ylim))) +
  labs(x = "", y = "Recovered", color = "Model") +
  scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
                     minor_breaks = seq(0, max(ylim), by = 1000)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = cols) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Description of evolution

Positive cases and administered swabs

df = data.frame(date = COVID19$data,
                positives = c(NA, diff(COVID19$totale_casi)),
                swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )
ggplot(df, aes(x = date)) + 
  geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
  geom_line(aes(y = swabs, color = "swabs")) +
  geom_point(aes(y = positives, color = "positives"), pch = 0) +
  geom_line(aes(y = positives, color = "positives")) +
  labs(x = "", y = "Number of cases", color = "") +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  scale_color_manual(values = palette()[c(2,1)]) +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = y)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col=palette()[4]) + 
  geom_line(size = 0.5, col=palette()[4]) +
  labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
  scale_y_continuous(labels = scales::percent_format(),
                     breaks = seq(0, 0.5, by = 0.05)) +
  coord_cartesian(ylim = c(0,max(df$y, na.rm = TRUE))) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

Hospitalized and ICU patients

df = data.frame(date = COVID19$data,
                hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
                icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
ggplot(df, aes(x = date, y = hospital)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "orange") + 
  geom_line(size = 0.5, col = "orange") +
  labs(x = "", y = "Change hospitalized patients") +
  coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(min(df$hospital, na.rm = TRUE),
                                        max(df$hospital, na.rm = TRUE), 
                                        by = 100)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))

ggplot(df, aes(x = date, y = icu)) + 
  geom_smooth(method = "loess", se = TRUE, col = "black") +
  geom_point(col = "red2") + 
  geom_line(size = 0.5, col = "red2") +
  labs(x = "", y = "Change ICU patients") +
  coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
  scale_y_continuous(minor_breaks = seq(min(df$icu, na.rm = TRUE), 
                                        max(df$icu, na.rm = TRUE), 
                                        by = 10)) +
  scale_x_date(date_breaks = "2 day", date_labels =  "%b%d",
               minor_breaks = "1 day") +
  theme_bw() +
  theme(legend.position = "top",
        axis.text.x = element_text(angle=60, hjust=1))